Let's understand this with an example. A time series can have many other time series nested inside it; we categorize these structures into three types: hierarchical, grouped, and mixed (both nested and crossed).
For example: sales of hybrid bikes can be divided by type of bike, such as city, commuting, and comfort models.
As we can see from the above equations, if we sum up the bottom level of the hierarchy and compare it with the level above, the totals are the same.
Let us understand this with an example from Australian tourism. This data contains quarterly domestic tourism demand, measured as the number of overnight trips Australians spend away from home.
library('fpp3')
## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
## ✔ tibble 3.1.8 ✔ tsibble 1.1.3
## ✔ dplyr 1.0.10 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.2.1 ✔ feasts 0.3.0
## ✔ lubridate 1.9.0 ✔ fable 0.3.2
## ✔ ggplot2 3.4.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
tourism <- tsibble::tourism |>
mutate(State = recode(State,
`New South Wales` = "NSW",
`Northern Territory` = "NT",
`Queensland` = "QLD",
`South Australia` = "SA",
`Tasmania` = "TAS",
`Victoria` = "VIC",
`Western Australia` = "WA"
))
Now the function aggregate_key() can create a hierarchical time series from the bottom level up, i.e. from states to the country. Let's try it out.
tourism_hts <- tourism |>
aggregate_key(State / Region, Trips = sum(Trips))
tourism_hts
## # A tsibble: 6,800 x 4 [1Q]
## # Key: State, Region [85]
## Quarter State Region Trips
## <qtr> <chr*> <chr*> <dbl>
## 1 1998 Q1 <aggregated> <aggregated> 23182.
## 2 1998 Q2 <aggregated> <aggregated> 20323.
## 3 1998 Q3 <aggregated> <aggregated> 19827.
## 4 1998 Q4 <aggregated> <aggregated> 20830.
## 5 1999 Q1 <aggregated> <aggregated> 22087.
## 6 1999 Q2 <aggregated> <aggregated> 21458.
## 7 1999 Q3 <aggregated> <aggregated> 19914.
## 8 1999 Q4 <aggregated> <aggregated> 20028.
## 9 2000 Q1 <aggregated> <aggregated> 22339.
## 10 2000 Q2 <aggregated> <aggregated> 19941.
## # … with 6,790 more rows
tourism_hts |>
filter(is_aggregated(Region)) |>
autoplot(Trips) +
labs(y = "Trips ('000)",
title = "Australian tourism: national and states") +
facet_wrap(vars(State), scales = "free_y", ncol = 3) +
theme(legend.position = "none")
tourism_hts |>
filter(State == "NT" | State == "QLD" |
State == "TAS" | State == "VIC", is_aggregated(Region)) |>
select(-Region) |>
mutate(State = factor(State, levels=c("QLD","VIC","NT","TAS"))) |>
gg_season(Trips) +
facet_wrap(vars(State), nrow = 2, scales = "free_y")+
labs(y = "Trips ('000)")
From this graph we can see that tourism peaks in winter in Queensland and the Northern Territory, while in the southern states of Victoria and Tasmania tourism is highest in the summer months.
This plot shows what is going on within states: some series show strong trends or seasonality, some show contrasting seasonality, while others appear to be just noise.
Now let's understand grouped time series. These arise when there is no single, natural hierarchical structure, so the series cross rather than nest.
For example: business trips and vacation trips.
If we total up all the bottom-level series, we get the same result as totalling up the series one level above.
Let us understand this with an example. Let's take Australian prison data, which we disaggregate by Gender, State, and Legal Status.
prison <- readr::read_csv("https://raw.githubusercontent.com/chirag35847/TS-images/main/prison_population.csv") |>
mutate(Quarter = yearquarter(Date)) |>
select(-Date) |>
as_tsibble(key = c(Gender, Legal, State, Indigenous),
index = Quarter) |>
relocate(Quarter)
## Rows: 3072 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): State, Gender, Legal, Indigenous
## dbl (1): Count
## date (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The syntax for a grouped time series uses * to cross the grouping factors:
prison_gts <- prison |>
aggregate_key(Gender * Legal * State, Count = sum(Count)/1e3)
Prisoners for the country, split by gender:
prison_gts |>
filter(!is_aggregated(Gender), is_aggregated(Legal),
is_aggregated(State)) |>
autoplot(Count) +
labs(y = "Number of prisoners ('000)")
Now the same split by gender for the various states:
prison_gts |>
filter(!is_aggregated(Gender), !is_aggregated(Legal),
!is_aggregated(State)) |>
mutate(Gender = as.character(Gender)) |>
ggplot(aes(x = Quarter, y = Count,
group = Gender, colour=Gender)) +
stat_summary(fun = sum, geom = "line") +
labs(title = "Prison population by state and gender",
y = "Number of prisoners ('000)") +
facet_wrap(~ as.character(State),
nrow = 1, scales = "free_y") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Data can be both nested and crossed. For example, the purpose of travel (holiday, business, visiting family, etc.) crosses with the geographic State/Region hierarchy.
We again use aggregate_key(), with the following syntax. The resulting tourism_full object contains 425 series: 85 hierarchical series, plus 340 series formed by crossing the hierarchy with the purpose of travel.
tourism_full <- tourism |>
aggregate_key((State/Region) * Purpose, Trips = sum(Trips))
Let's look at the purpose of travel at the country level, and then the state-wise purpose of travel.
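These two plots can be sketched as follows, reusing the tourism_full object defined in this section (the exact aesthetics are illustrative, not prescribed):

```r
# Purpose of travel aggregated to the national level
tourism_full |>
  filter(is_aggregated(State), !is_aggregated(Purpose)) |>
  autoplot(Trips) +
  labs(y = "Trips ('000)",
       title = "Australian tourism: purpose of travel (national)")

# Purpose of travel within each state (regions aggregated)
tourism_full |>
  filter(!is_aggregated(State), is_aggregated(Region),
         !is_aggregated(Purpose)) |>
  autoplot(Trips) +
  facet_wrap(vars(as.character(State)), nrow = 2, scales = "free_y") +
  labs(y = "Trips ('000)",
       title = "Australian tourism: purpose of travel by state") +
  theme(legend.position = "bottom")
```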
Forecasting a hierarchical or grouped time series involves:
- selecting one level of aggregation and generating forecasts for that level;
- aggregating or disaggregating these forecasts to obtain a coherent set of forecasts for the rest of the structure.
A simple method for generating coherent forecasts is the bottom-up approach.
This approach involves first generating forecasts for each series at the bottom level, and then summing these to produce forecasts for all the series in the structure.
Summing these, we get h-step-ahead coherent forecasts for the rest of the series. For this approach:
- no information is lost due to aggregation;
- however, bottom-level data can be quite noisy and more challenging to model and forecast.
Aim: produce national and state forecasts for the Australian tourism data, without disaggregating by region or purpose of travel.
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide SA Business 135.
## 2 1998 Q2 Adelaide SA Business 110.
## 3 1998 Q3 Adelaide SA Business 166.
## 4 1998 Q4 Adelaide SA Business 127.
## 5 1999 Q1 Adelaide SA Business 137.
## 6 1999 Q2 Adelaide SA Business 200.
## 7 1999 Q3 Adelaide SA Business 169.
## 8 1999 Q4 Adelaide SA Business 134.
## 9 2000 Q1 Adelaide SA Business 154.
## 10 2000 Q2 Adelaide SA Business 169.
## # … with 24,310 more rows
tourism_states <- tourism |>
aggregate_key(State, Trips = sum(Trips))
tourism_states
## # A tsibble: 720 x 3 [1Q]
## # Key: State [9]
## Quarter State Trips
## <qtr> <chr*> <dbl>
## 1 1998 Q1 <aggregated> 23182.
## 2 1998 Q2 <aggregated> 20323.
## 3 1998 Q3 <aggregated> 19827.
## 4 1998 Q4 <aggregated> 20830.
## 5 1999 Q1 <aggregated> 22087.
## 6 1999 Q2 <aggregated> 21458.
## 7 1999 Q3 <aggregated> 19914.
## 8 1999 Q4 <aggregated> 20028.
## 9 2000 Q1 <aggregated> 22339.
## 10 2000 Q2 <aggregated> 19941.
## # … with 710 more rows
fcasts_state <- tourism_states |>
filter(!is_aggregated(State)) |>
model(ets = ETS(Trips)) |>
forecast()
# Sum bottom-level forecasts to get top-level forecasts
fcasts_national <- fcasts_state |>
summarise(value = sum(Trips), .mean = mean(value))
tourism_states |>
model(ets = ETS(Trips)) |>
reconcile(bu = bottom_up(ets)) |>
forecast()
## # A fable: 144 x 5 [1Q]
## # Key: State, .model [18]
## State .model Quarter Trips .mean
## <chr*> <chr> <qtr> <dist> <dbl>
## 1 ACT ets 2018 Q1 N(701, 7651) 701.
## 2 ACT ets 2018 Q2 N(717, 8032) 717.
## 3 ACT ets 2018 Q3 N(734, 8440) 734.
## 4 ACT ets 2018 Q4 N(750, 8882) 750.
## 5 ACT ets 2019 Q1 N(767, 9368) 767.
## 6 ACT ets 2019 Q2 N(784, 9905) 784.
## 7 ACT ets 2019 Q3 N(800, 10503) 800.
## 8 ACT ets 2019 Q4 N(817, 11171) 817.
## 9 ACT bu 2018 Q1 N(701, 7651) 701.
## 10 ACT bu 2018 Q2 N(717, 8032) 717.
## # … with 134 more rows
data |>
aggregate_key() |>
model() |>
reconcile() |>
forecast()
1. Begin with a tsibble object (here labelled data) containing the individual bottom-level series.
2. Define in aggregate_key() the aggregation structure and build a tsibble object that also contains the aggregate series.
3. Identify a model() for each series, at all levels of aggregation.
4. Specify in reconcile() how the coherent forecasts are to be generated from the selected models.
5. Use the forecast() function to generate forecasts for the whole aggregation structure.
Top-down approaches involve first generating forecasts for the Total series, and then disaggregating these down the hierarchy.
Let $p_1, \dots, p_m$ denote a set of disaggregation proportions which determine how the forecasts of the Total series are to be distributed to obtain forecasts for each series at the bottom level of the structure. For example, for the hierarchy in the figure below, using proportions $p_1, \dots, p_m$ we get $\tilde{y}_{j,t} = p_j \hat{y}_t$ for each bottom-level series $j$.
Top-down forecasts can be generated using top_down() within the reconcile() function
Average historical proportions:
$$p_j = \frac{1}{T} \sum_{t=1}^{T} \frac{y_{j,t}}{y_t}$$
for $j = 1, \dots, m$. Each proportion $p_j$ reflects the average of the historical proportions of the bottom-level series $y_{j,t}$ over the period $t = 1, \dots, T$, relative to the total aggregate $y_t$.
This approach is implemented in the top_down() function by setting method = "average_proportions".
Proportions of the historical averages:
$$p_j = \sum_{t=1}^{T} \frac{y_{j,t}}{T} \Bigg/ \sum_{t=1}^{T} \frac{y_t}{T}$$
for $j = 1, \dots, m$. Each proportion $p_j$ captures the average historical value of the bottom-level series $y_{j,t}$ relative to the average value of the total aggregate $y_t$.
This approach is implemented in the top_down() function by setting method = "proportion_averages".
Hierarchical Tree Diagram
Forecast proportions:
$$p_j = \prod_{\ell=0}^{K-1} \frac{\hat{y}_{j,h}^{(\ell)}}{\hat{S}_{j,h}^{(\ell+1)}},$$
where $\hat{y}_{j,h}^{(\ell)}$ is the h-step-ahead initial forecast of the series that corresponds to the node which is $\ell$ levels above $j$, and $\hat{S}_{j,h}^{(\ell)}$ is the sum of the h-step-ahead initial forecasts below the node that is $\ell$ levels above node $j$ and directly connected to that node. These forecast proportions disaggregate the h-step-ahead initial forecast of the Total series to get h-step-ahead coherent forecasts of the bottom-level series.
This approach is implemented in the top_down() function by setting method = "forecast_proportions". It is the default method.
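For example, a top-down reconciliation of the state-level ETS models can be sketched as follows, reusing the tourism_states object created earlier (method = "forecast_proportions" is the default and could be omitted):

```r
tourism_states |>
  model(ets = ETS(Trips)) |>
  reconcile(
    td = top_down(ets, method = "forecast_proportions")
  ) |>
  forecast(h = "2 years")
```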
One disadvantage of top-down approaches is the loss of information due to aggregation: we are unable to capture and take advantage of individual series characteristics such as time dynamics, special events, and differing seasonal patterns.
The middle-out approach combines bottom-up and top-down approaches.
First, a “middle” level is chosen and forecasts are generated for all the series at this level. For the series above the middle level, coherent forecasts are generated using the bottom-up approach by aggregating the “middle-level” forecasts upwards. For the series below the “middle level”, coherent forecasts are generated using a top-down approach by disaggregating the “middle level” forecasts downwards.
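A sketch using the tourism_hts hierarchy from earlier, with State as the "middle" level; note that the split argument below is an assumption about how fabletools selects the middle level (the number of levels below the top):

```r
tourism_hts |>
  model(ets = ETS(Trips)) |>
  reconcile(
    mo = middle_out(ets, split = 1)  # split = 1: State level as the middle
  ) |>
  forecast(h = "2 years")
```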
The equations we have seen so far can be represented using matrix notation. The aggregation of the bottom-level series is determined by an $n \times m$ matrix $S$, often known as the "summing matrix", which is defined for each aggregation structure.
Example 1
We can represent the above example in matrix form, or more compactly as the single equation
$$y_t = S b_t,$$
where $y_t$ is an n-dimensional vector of all the observations in the hierarchy at time t, $S$ is the summing matrix, and $b_t$ is an m-dimensional vector of all the observations at the bottom level of the hierarchy at time t. The first row of the summing matrix $S$ reflects the total aggregate, whereas the second and third rows represent the aggregates one level below the total.
Similarly, for the structure in the figure, the summing matrix is as follows, where the second and third rows of $S$ represent the corresponding group aggregates.
This matrix notation enables us to express all forecasting methods for hierarchical or grouped time series in a single framework.
Suppose we forecast all series ignoring any aggregation constraints. These are called the base forecasts, denoted by $\hat{y}_h$, where h is the forecast horizon. They are stacked in the same order as the data $y_t$.
Hence, for either hierarchical or grouped structures, all coherent forecasting approaches can be written as
$$\tilde{y}_h = S G \hat{y}_h,$$
where $G$ is a matrix that maps the base forecasts into the bottom level, and $S$ sums these up using the aggregation structure to produce the set of coherent forecasts $\tilde{y}_h$.
The $G$ matrix is defined according to the approach used. For example, if the bottom-up approach is applied to forecast the Example 1 hierarchy, then $G$ has two partitions: the first three columns zero out the base forecasts of the series above the bottom level, while the m-dimensional identity matrix picks out only the bottom-level base forecasts. The $S$ matrix then sums them all together.
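To make this concrete, here is a small self-contained base-R sketch for a toy hierarchy (Total; two groups A and B; bottom-level series AA, AB under A and BA under B). The numbers are invented purely for illustration:

```r
# Summing matrix S for the order: Total, A, B, AA, AB, BA (n = 6, m = 3)
S <- rbind(
  c(1, 1, 1),   # Total = AA + AB + BA
  c(1, 1, 0),   # A     = AA + AB
  c(0, 0, 1),   # B     = BA
  diag(3)       # bottom-level series map to themselves
)

# Bottom-up G: zero out the 3 aggregate base forecasts, keep the 3 bottom ones
G <- cbind(matrix(0, 3, 3), diag(3))

# Incoherent base forecasts, stacked in the same order as the rows of S
y_hat <- c(100, 55, 40, 30, 28, 41)   # note 55 + 40 != 100

y_tilde <- S %*% G %*% y_hat          # coherent bottom-up forecasts
drop(y_tilde)                         # 99 58 41 30 28 41

# The unbiasedness condition SGS = S holds for bottom-up reconciliation:
all(S %*% G %*% S == S)               # TRUE
```

Any vector of base forecasts pre-multiplied by S %*% G satisfies the aggregation constraints by construction.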
If a top-down approach were adopted, the first column contains the set of proportions that distribute the top level's base forecast to the bottom level; the $S$ matrix then sums these up. The remaining columns zero out the base forecasts below the highest level of aggregation.
The $G$ matrix for a middle-out approach is a hybrid of the two above. The base forecasts of a pre-selected level are disaggregated to the bottom level using a set of proportions, all other base forecasts are zeroed out, and the bottom-level forecasts are then summed up the hierarchy using the summing matrix.
As discussed above, pre-multiplying any set of base forecasts by $SG$ returns a set of coherent forecasts.
The traditional methods above use base forecasts from a single level of aggregation, and so use only limited information. More generally, we can choose $G$ so that $SG$ combines and reconciles all the base forecasts to produce coherent forecasts. In fact, we can find the $G$ matrix that minimizes the total forecast variance of the set of coherent forecasts, leading to the MinT (Minimum Trace) optimal reconciliation approach.
Before going further, we need to make sure that we have unbiased forecasts. If the base forecasts $\hat{y}_h$ are unbiased, then the coherent forecasts $\tilde{y}_h$ are unbiased provided $S G S = S$.
IMPORTANT: No top-down method satisfies this constraint, so all top-down approaches result in biased coherent forecasts.
The variance-covariance matrix of the h-step-ahead coherent forecast errors is given by
$$V_h = \mathrm{Var}[y_{T+h} - \tilde{y}_h] = S G W_h G' S',$$
where $W_h = \mathrm{Var}[y_{T+h} - \hat{y}_h]$ is the variance-covariance matrix of the corresponding base forecast errors.
The objective is to find a matrix $G$ that minimizes the error variances of the coherent forecasts. These error variances are on the diagonal of the matrix $S G W_h G' S'$, and so the sum of all the error variances is given by the trace of that matrix. The matrix $G$ which minimizes the trace of $S G W_h G' S'$ such that $S G S = S$ is given by
$$G = (S' W_h^{-1} S)^{-1} S' W_h^{-1}.$$
Therefore, the optimally reconciled forecasts are given by
$$\tilde{y}_h = S (S' W_h^{-1} S)^{-1} S' W_h^{-1} \hat{y}_h.$$
MinT is implemented by min_trace() within the reconcile()
function.
We need to estimate $W_h$, the forecast error variance of the h-step-ahead base forecasts. This can be difficult, and so four simplifying approximations are used that have been shown to work well in both simulations and in practice.
Set $W_h = k_h I$ for all h, where $k_h > 0$. This is the most simplifying assumption to make, and means that $G$ is independent of the data, providing substantial computational savings. The disadvantage, however, is that this specification does not account for the differences in scale between the levels of the structure, or for relationships between series.
Setting $W_h = k_h I$ gives the ordinary least squares (OLS) estimator, with design matrix $S$ and response $\hat{y}_h$. Hence this approach is usually referred to as OLS reconciliation. It is implemented in min_trace() by setting method = "ols".
Set $W_h = k_h \, \mathrm{diag}(\hat{W}_1)$ for all h, where $k_h > 0$,
$$\hat{W}_1 = \frac{1}{T} \sum_{t=1}^{T} e_t e_t',$$
and $e_t$ is an n-dimensional vector of residuals of the models that generated the base forecasts, stacked in the same order as the data. This is referred to as the WLS (weighted least squares) estimator using variance scaling, because this specification scales the base forecasts using the variance of the residuals. The approach is implemented in min_trace() by setting method = "wls_var".
Set $W_h = k_h \Lambda$ for all h, where $k_h > 0$, $\Lambda = \mathrm{diag}(S \mathbf{1})$, and $\mathbf{1}$ is a unit vector of dimension m (the number of bottom-level series). This is referred to as structural scaling because the estimator depends only on the structure of the aggregations, and not on the actual data. The approach is implemented in min_trace() by setting method = "wls_struct".
Set $W_h = k_h W_1$ for all h, where $k_h > 0$. Here we only assume that the error covariance matrices are proportional to each other, and we directly estimate the full one-step covariance matrix $W_1$. The most obvious and simple way to do this is to use the sample covariance. This is implemented in min_trace() by setting method = "mint_cov".
However, for cases where the number of bottom-level series m is large compared to the length of the series T, this is not a good estimator. Instead we use a shrinkage estimator which shrinks the sample covariance towards a diagonal matrix. This is implemented in min_trace() by setting method = "mint_shrink".
Unlike any other existing method, the optimal reconciliation forecasts are produced using all the information available within a hierarchical or grouped structure. This is crucial because particular aggregation levels or groupings may reveal features of the data that are of interest to the user and important to model.
We will compute forecasts for the Australian tourism data . We use the data up to the end of 2015 as a training set, withholding the final two years (eight quarters, 2016Q1–2017Q4) as a test set for evaluation. The code below demonstrates the full workflow for generating coherent forecasts using the bottom-up, OLS and MinT methods.
The accuracy of the forecasts over the test set can be evaluated using the accuracy() function. We summarise some results using RMSE and MASE.
tourism_full <- tourism |>
aggregate_key((State/Region) * Purpose, Trips = sum(Trips))
fit <- tourism_full |>
filter(year(Quarter) <= 2015) |>
model(base = ETS(Trips)) |>
reconcile(
bu = bottom_up(base),
ols = min_trace(base, method = "ols"),
mint = min_trace(base, method = "mint_shrink")
)
fc <- fit |> forecast(h = "2 years")
fc |>
filter(is_aggregated(Region), is_aggregated(Purpose)) |>
autoplot(
tourism_full |> filter(year(Quarter) >= 2011),
level = NULL
) +
labs(y = "Trips ('000)") +
facet_wrap(vars(State), scales = "free_y")
fc |>
filter(is_aggregated(State), !is_aggregated(Purpose)) |>
autoplot(
tourism_full |> filter(year(Quarter) >= 2011),
level = NULL
) +
labs(y = "Trips ('000)") +
facet_wrap(vars(Purpose), scales = "free_y")
fc |>
filter(is_aggregated(State), is_aggregated(Purpose)) |>
accuracy(
data = tourism_full,
measures = list(rmse = RMSE, mase = MASE)
) |>
group_by(.model) |>
summarise(rmse = mean(rmse), mase = mean(mase))
## # A tibble: 4 × 3
## .model rmse mase
## <chr> <dbl> <dbl>
## 1 base 1721. 1.53
## 2 bu 3070. 3.16
## 3 mint 2158. 2.09
## 4 ols 1804. 1.63
Reconciled distributional forecasts are probabilistic forecasts obtained by reconciling the individual base forecast distributions into a single coherent forecast distribution that reflects the information from all of the individual forecasts.
We will focus on two fundamental results that are implemented in the reconcile() function.
If the base forecasts are normally distributed, i.e.
$$\hat{y}_h \sim \mathrm{N}(\hat{\mu}_h, \hat{\Sigma}_h),$$
then the reconciled forecasts are also normally distributed,
$$\tilde{y}_h \sim \mathrm{N}(S G \hat{\mu}_h, \; S G \hat{\Sigma}_h G' S'),$$
where $S$ is the summing matrix and $G$ is the reconciliation matrix.
When the assumption of normality is not reasonable for the base forecasts:
we can use a non-parametric approach such as bootstrapping to generate a large number of possible future sample paths from the model(s) that produce the base forecasts. These sample paths represent a range of possible future outcomes based on the underlying model(s) and provide a way to estimate the distribution of the forecast errors.
Once the sample paths are generated, we can reconcile them in the same way as the point forecasts, by applying the $SG$ transformation to each sample path. This ensures the simulated futures are consistent with each other and with the aggregation constraints between the variables being forecast.
After the sample paths are reconciled, we can compute coherent prediction intervals that reflect the uncertainty in the reconciled forecast. One way to do this is to compute quantiles of the reconciled sample paths at each future point in time.
For example, if we generate 1,000 reconciled sample paths, we can compute the 2.5th and 97.5th percentiles of the sample paths at each future point to obtain a 95% prediction interval.
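In fabletools, this bootstrapped-sample-path approach is available directly through forecast(); a sketch using the reconciled tourism models (the fit object) created earlier:

```r
fc_boot <- fit |>
  forecast(h = "2 years", bootstrap = TRUE, times = 1000)

# 95% prediction intervals computed from the bootstrapped sample paths
fc_boot |> hilo(level = 95)
```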
This approach allows us to account for the non-normality of the forecast errors and to incorporate the information from the base forecasts into a single, coherent forecast that reflects the collective knowledge and information of the individual forecasts.
Returning to the Australian prison population data (Section 11.1), we will compare the forecasts from the bottom-up and MinT methods applied to base ETS models.
prison <- readr::read_csv("https://raw.githubusercontent.com/chirag35847/TS-images/main/prison_population.csv") |>
mutate(Quarter = yearquarter(Date)) |>
select(-Date) |>
as_tsibble(key = c(Gender, Legal, State, Indigenous),
index = Quarter) |>
relocate(Quarter)
prison_gts <- prison |>
aggregate_key(Gender * Legal * State, Count = sum(Count)/1e3)
fit <- prison_gts |>
filter(year(Quarter) <= 2014) |>
model(base = ETS(Count)) |>
reconcile(
bottom_up = bottom_up(base),
MinT = min_trace(base, method = "mint_shrink")
)
fc <- fit |> forecast(h = 8)
fc |>
filter(is_aggregated(State), is_aggregated(Gender),
is_aggregated(Legal)) |>
autoplot(prison_gts, alpha = 0.7, level = 90) +
labs(y = "Number of prisoners ('000)",
title = "Australian prison population (total)")
fc |>
filter(
.model %in% c("base", "MinT"),
!is_aggregated(State), is_aggregated(Legal),
is_aggregated(Gender)
) |>
autoplot(
prison_gts |> filter(year(Quarter) >= 2010),
alpha = 0.7, level = 90
) +
labs(title = "Prison population (by state)",
y = "Number of prisoners ('000)") +
facet_wrap(vars(State), scales = "free_y", ncol = 4) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
fc |>
filter(is_aggregated(State), is_aggregated(Gender),
is_aggregated(Legal)) |>
accuracy(data = prison_gts,
measures = list(mase = MASE,
ss = skill_score(CRPS)
)
) |>
group_by(.model) |>
summarise(mase = mean(mase), sspc = mean(ss) * 100)
## # A tibble: 3 × 3
## .model mase sspc
## <chr> <dbl> <dbl>
## 1 base 1.72 55.9
## 2 bottom_up 1.84 33.5
## 3 MinT 0.895 76.8